ABSTRACT*

Codebook Excited Linear Prediction [1] is a 
popular analysis by synthesis technique for quantiz- 
ing speech at bit rates from 4 to 16 kbps. Codebook 
design techniques to date have been largely based 
on either random (often Gaussian) codebooks, or on
known binary or ternary codes which efficiently map 
the space of (assumed white) excitation codevectors. 
It has been shown that by introducing symmetries 
into the codebook, good complexity reduction can 
be realized with only a marginal decrease in performance.
In this paper we consider codebook design
algorithms for a wide range of structured codebooks. 

INTRODUCTION 

This paper considers CELP codebook design al- 
gorithms for a variety of structured codebooks. A 
structured codebook has certain properties which en- 
able it to be searched faster than unstructured code- 
books. The design algorithms are applied to CELP 
coders, but are sufficiently general to be applied to 
other distortion measures as well. 

Consider the CELP analysis structure shown in
Figure 1. The long term (quantized) inverse filter
(with $2q + 1$ non-zero taps), $B(z)$, for subframe $n$
is given by:

$$B(z) = 1 - \sum_{k=-q}^{q} b_k z^{-(M+k)} \tag{1}$$

and the short term (quantized) inverse filter (order
$p$), $A(z)$, for subframe $n$ is given by:

$$A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k} \tag{2}$$


* This work has been sponsored by the Telecommunications 
Research Institute of Ontario (TRIO). 


The perceptual weighting filter, which attempts to
obtain a larger signal to noise ratio in inter-formant
regions, is given by:

$$W(z) = \frac{A_p(z/\gamma)}{A_p(z/\beta)} \tag{3}$$

where $\gamma$ and $\beta$ are optimized based on subjective
measures, and $A_p(z)$ is the optimum unquantized
inverse filter (for subframe $n$).



Figure 1: CELP Search Procedure. The codebook
dimension, or subframe size, is $K_c$. The index
$n$ is over all subframes, and the index $k$ is over
all elements of a particular subframe. Thus,
$s_{n,k}$ is the $k$-th element of the $n$-th subframe.

Typically, $A_p(z)$ is determined to minimize the
open loop residual energy, and $B(z)$ is determined
(closed loop) to minimize the noise weighted error
before determination of the codebook excitation (the
energy in $y_{n,k}$). The determination of these
parameters and complexity reduction techniques based on
the structured codebooks are beyond the scope of this
paper. The interested reader is referred to [2, 3, 4,
5, 6]. Overlap is often used to reduce block coding
edge effects. That is, components of the excitation
vector near the end of a subframe have little effect on
the current subframe, but may adversely affect future
subframes. Overlap considers the influence of these
elements by letting the filters ring for $K_o$ samples
after the last sample in the excitation vector.

The weighted mean squared error for a particular
codebook index $l$ over a subframe (at subframe index
$n$) of dimension $K_c$ with overlap $K_o$ is given by:

$$\epsilon_n^{(l)} = y_n^T y_n - 2 G_n^{(l)} y_n^T H_n v^{(l)} + \left( G_n^{(l)} \right)^2 v^{(l)T} H_n^T H_n v^{(l)} \tag{4}$$

where the $K_c + K_o$ by $K_c$ dimensional lower
triangular Toeplitz matrix $H_n$ represents the zero state
filtering operation (of $W(z)/A(z)$). The $l$-th
excitation (column) vector $v^{(l)}$ is of dimension $K_c$,
and the (column) vectors $y_n$ and $G_n^{(l)} H_n v^{(l)}$
are of dimension $K$ ($= K_c + K_o$).
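As a minimal sketch of Equation 4, the zero state filtering matrix can be built from a truncated impulse response and the weighted error evaluated for a candidate codevector. The names `toeplitz_filter_matrix`, `weighted_error` and `optimal_gain`, and the toy impulse response, are illustrative assumptions and not part of the original coder. The gain formula is the standard least squares result obtained by setting the derivative of Equation 4 with respect to the gain to zero.

```python
def toeplitz_filter_matrix(h, k_c, k_o):
    """Build the (k_c + k_o) x k_c lower triangular Toeplitz matrix H.

    Entry (i, j) is h[i - j]; it is zero above the diagonal and beyond
    the end of the (truncated) impulse response h.
    """
    rows = k_c + k_o
    return [[h[i - j] if 0 <= i - j < len(h) else 0.0 for j in range(k_c)]
            for i in range(rows)]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def weighted_error(y, h_mat, v, gain):
    """Equation 4: y'y - 2 G y'Hv + G^2 v'H'Hv."""
    hv = matvec(h_mat, v)
    return dot(y, y) - 2.0 * gain * dot(y, hv) + gain * gain * dot(hv, hv)


def optimal_gain(y, h_mat, v):
    """Gain minimizing Equation 4 for a fixed codevector (assumed unquantized)."""
    hv = matvec(h_mat, v)
    energy = dot(hv, hv)
    return dot(y, hv) / energy if energy > 0.0 else 0.0


# Toy example: K_c = 4 samples, K_o = 2 samples of overlap (ringing).
h = [1.0, 0.5, 0.25]                      # hypothetical impulse response
H = toeplitz_filter_matrix(h, k_c=4, k_o=2)
v = [1.0, 0.0, -1.0, 0.0]                 # candidate codevector
y = matvec(H, [2.0, 0.0, -2.0, 0.0])      # target built from 2 * v
g = optimal_gain(y, H, v)
err = weighted_error(y, H, v, g)
```

Because the toy target is exactly twice the filtered codevector, the optimal gain recovers that scale factor and the error vanishes; the two extra rows of `H` carry the filter ringing into the overlap region.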

The codebook design algorithms are all based 
on the Generalized Lloyd Algorithm (GLA) [7, 8, 9] 
and require a sufficiently rich training sequence to 
design the codebook. Due to the long memory in 
$1/B(z)$, the algorithm is not guaranteed to converge
to a local minimum. That is, the set of training
vectors $T = \{y_n\}$ changes from one iteration to the
next. The problem arises because (for simplicity) 
we assume the training vectors do not depend on the 
codevectors. Due to the long and short term predictor 
memory this is not the case. In practice, convergence 
is similar to the GLA, although the average weighted 
mean squared error has been observed to increase 
(slightly) after some iterations. 

The optimum codebook is defined as that which
minimizes Equation 4 over the whole training
sequence. We minimize:

$$\epsilon = \sum_{n=0}^{N-1} \epsilon_n^{(l_n)} = \sum_{n=0}^{N-1} \left( y_n^T y_n - 2 G_n y_n^T H_n v^{(l_n)} + G_n^2 v^{(l_n)T} H_n^T H_n v^{(l_n)} \right) \tag{5}$$

The index $n$ is over all training vectors ($y_n$), $l_n$
is the optimum codebook index for training vector
(or subframe) $n$, $v^{(l_n)}$ is the optimum codevector
(for subframe $n$) and $G_n$ is the optimum gain for
codevector $v^{(l_n)}$ ($G_n = G_n^{(l_n)}$). The codebook
design techniques are all based on minimization of
Equation 5. All design techniques assume training
vector $y_n$ is not a function of the current, or past
codevectors.

In Section 2 we consider general codebook design.
The codebook is given by $L_c$ distinct $K_c$
dimensional codevectors. This section also considers
codebooks in which the codevectors have many zero
elements.

GENERAL CODEBOOK DESIGN 

We now discuss techniques whereby near optimal
codebooks may be designed for general, or sparse
codebooks. The technique is based on a vector
quantizer design algorithm using the noise weighted mean
squared error distortion measure. Due to the influence
of previous codevectors on future codevectors
(via the long term predictor memory), only suboptimal
codebooks may be designed (the error is not
guaranteed to decrease continually to a local optimum).
In practice, the average distortion usually decreases
until a local optimum is found, then oscillates
slowly in the vicinity of that local optimum.

Unstructured Codebooks 

The goal is to minimize Equation 5 over all possible
codebooks of size $L_c$ and dimension $K_c$. Given
a training sequence of $N$ speech vectors $S = \{s_n\}$,
and an initial codebook $C^{(0)} = \{v^{(l)}\}$, we analyze
the vectors using the CELP structure to obtain the
training set $T^{(0)} = \{y_n\}$. Essentially, we use the
initial codebook to partition the training sequence
($T^{(0)}$) into $L_c$ cells, or regions $\mathcal{R}^{(j)}$,
according to the nearest neighbour search, and compute
new centroids (or codevectors) for the regions. Cell $j$
is comprised of those subframes which have $l_n = j$
(the optimum codebook index at time $n$ is $j$). Equation 5
can then be split up into $L_c$ terms, one term for each
particular cell:

$$\epsilon = \sum_{n \in \mathcal{R}^{(0)}} \left| y_n - G_n H_n v^{(0)} \right|^2 + \sum_{n \in \mathcal{R}^{(1)}} \left| y_n - G_n H_n v^{(1)} \right|^2 + \ldots + \sum_{n \in \mathcal{R}^{(L_c - 1)}} \left| y_n - G_n H_n v^{(L_c - 1)} \right|^2 \tag{6}$$

where the summation indicates summation over the
region in which all codevectors are identical.
Minimization of Equation 6 is equivalent to minimizing
each term, since a particular codevector only
influences the summation in its region. Furthermore, in




International Mobile Satellite Conference, Ottawa, 1990 



each region ($j$), $v^{(l_n)}$ does not depend on $n$ (since
$j = l_n$). Thus we minimize (with respect to $v^{(j)}$):

$$\epsilon^{(j)} = \sum_{n \in \mathcal{R}^{(j)}} \left( y_n^T y_n - 2 G_n y_n^T H_n v^{(l_n)} + G_n^2 v^{(l_n)T} H_n^T H_n v^{(l_n)} \right) \tag{7}$$

for each region $j$, $0 \le j < L_c$. Since $v^{(l_n)} = v^{(j)}$ is
a constant for each region and does not depend on
the index $n$, we may write:

$$\begin{aligned}
\epsilon^{(j)} &= \sum_{n \in \mathcal{R}^{(j)}} y_n^T y_n - 2 \left( \sum_{n \in \mathcal{R}^{(j)}} G_n y_n^T H_n \right) v^{(j)} + v^{(j)T} \left( \sum_{n \in \mathcal{R}^{(j)}} G_n^2 H_n^T H_n \right) v^{(j)} \\
&= \sigma^{(j)2} - 2 c^{(j)T} v^{(j)} + v^{(j)T} R^{(j)} v^{(j)}
\end{aligned} \tag{8}$$

where

$$\sigma^{(j)2} = \sum_{n \in \mathcal{R}^{(j)}} y_n^T y_n \tag{9}$$

$$c^{(j)} = \sum_{n \in \mathcal{R}^{(j)}} G_n H_n^T y_n \tag{10}$$

$$R^{(j)} = \sum_{n \in \mathcal{R}^{(j)}} G_n^2 H_n^T H_n \tag{11}$$

It can easily be shown (differentiate with respect
to $v^{(j)}$) that to minimize Equation 8 we choose:

$$v^{(j)} = R^{(j)-1} c^{(j)} \tag{12}$$

which can be efficiently accomplished by using
Choleski decomposition. This is performed over all
$j$, $0 \le j < L_c$.

We will now have a new codebook ($C^{(1)}$), which
can be used in the CELP analysis structure to obtain
the training set $T^{(1)}$. Unlike typical VQ design
techniques, the training set $T^{(1)}$ will not be the same
as $T^{(0)}$. The above design algorithm is just a simple
extension of the GLA for a CELP type distortion
based on the above assumptions.

Sparse Codebook Design

To design sparse codebooks, we essentially want
to minimize Equation 8 (for each $j$), given the
constraint that there are a large number of zero values
in the codevectors. We use the multipulse sequential
approach (for complexity reasons), and first compute
the optimum pulse location and gain (assuming one
non-zero value in the codevector) to minimize
Equation 8.

We then iteratively add another pulse location,
and so on, until we have the desired number of
non-zero pulses in the codevector. After each iteration,
the pulse amplitudes are re-optimized.

To minimize:

$$\epsilon^{(j)} = \sigma^{(j)2} - 2 c^{(j)T} v^{(j)} + v^{(j)T} R^{(j)} v^{(j)} \tag{13}$$

for the first pulse position ($k_0$) and amplitude ($v_{k_0}^{(j)}$)
we minimize:

$$\epsilon^{(j,1)} = \sigma^{(j)2} - 2 c_{k_0}^{(j)} v_{k_0}^{(j)} + \left( v_{k_0}^{(j)} \right)^2 R_{k_0 k_0}^{(j)} \tag{14}$$

which has solution (for a particular position $k_0$):

$$v_{k_0}^{(j)} = \frac{c_{k_0}^{(j)}}{R_{k_0 k_0}^{(j)}} \tag{15}$$

The first position is computed by trying all locations,
and choosing that which minimizes Equation 14.

Assuming the first pulse location is fixed, the
second location is chosen to minimize:

$$\epsilon^{(j,2)} = \sigma^{(j)2} - 2 c_{k_0}^{(j)} v_{k_0}^{(j)} - 2 c_{k_1}^{(j)} v_{k_1}^{(j)} + \left( v_{k_0}^{(j)} \right)^2 R_{k_0 k_0}^{(j)} + 2 v_{k_0}^{(j)} v_{k_1}^{(j)} R_{k_1 k_0}^{(j)} + \left( v_{k_1}^{(j)} \right)^2 R_{k_1 k_1}^{(j)} \tag{16}$$

If $v_{k_0}^{(j)}$ is not to be modified (as part of the search,
for complexity reasons), then:

$$v_{k_1}^{(j)} = \frac{c_{k_1}^{(j)} - v_{k_0}^{(j)} R_{k_1 k_0}^{(j)}}{R_{k_1 k_1}^{(j)}} \tag{17}$$

and:

$$\epsilon^{(j,2)} = \epsilon^{(j,1)} - \frac{\left( c_{k_1}^{(j)} - v_{k_0}^{(j)} R_{k_1 k_0}^{(j)} \right)^2}{R_{k_1 k_1}^{(j)}} \tag{18}$$

where $\epsilon^{(j,1)}$ is the value of Equation 14 for the
first pulse. The mean squared error is minimized by
maximizing the second (squared) term. At the end of the
search for the second pulse position, the amplitudes
of the first and second pulse positions can be
optimized by minimizing Equation 16 with respect to the
unknown amplitudes $v_{k_0}^{(j)}$ and $v_{k_1}^{(j)}$.

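In general, the $n$-th pulse position is given by
computing the minimum over all pulse locations ($k_n$)
of:

$$\epsilon^{(j,n+1)} = \sigma^{(j)2} - 2 \sum_{i=0}^{n} c_{k_i}^{(j)} v_{k_i}^{(j)} + \sum_{i=0}^{n} \sum_{m=0}^{n} v_{k_i}^{(j)} v_{k_m}^{(j)} R_{k_i k_m}^{(j)} \tag{19}$$

and the pulse amplitudes are optimized by solving
(by Choleski Decomposition):

$$\bar{v}^{(j)} = \left( \bar{R}^{(j)} \right)^{-1} \bar{c}^{(j)} \tag{20}$$

where:

$$\bar{c}^{(j)} = \left[ c_{k_0}^{(j)}, c_{k_1}^{(j)}, \ldots, c_{k_n}^{(j)} \right]^T \tag{21}$$

and:

$$\left[ \bar{R}^{(j)} \right]_{i,m} = R_{k_i k_m}^{(j)}, \quad 0 \le i, m \le n \tag{22}$$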


The sequential multipulse search procedure is in- 
herently suboptimal since it does not try all combina- 
tions of pulse positions. However, a full search tech- 
nique is prohibitively complex. Rather than keep- 
ing a single pulse position from stage to stage how- 
ever, it is clearly better to keep Mi survivors after 
each stage (for the sequential optimization described 
above, Mi = 1). 
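The sequential search with a single survivor can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, $R^{(j)}$ is assumed symmetric positive definite, and each new position is scored by the error reduction obtained with the earlier amplitudes held fixed, after which all amplitudes are re-optimized on the reduced system by a Choleski solve.

```python
import math


def cholesky_solve(a, b):
    """Solve a x = b for symmetric positive definite a using a = L L^T."""
    n = len(b)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    z = [0.0] * n                     # forward substitution: L z = b
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n                     # back substitution: L^T x = z
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def sparse_codevector(c, r, num_pulses):
    """Greedy pulse placement minimizing sigma^2 - 2 c'v + v'Rv.

    Returns the sparse codevector and the chosen pulse positions.
    """
    dim = len(c)
    positions, amps = [], []
    for _ in range(min(num_pulses, dim)):
        best_pos, best_drop = None, -1.0
        for k in range(dim):
            if k in positions:
                continue
            # Residual correlation with earlier pulses held fixed.
            resid = c[k] - sum(a * r[k][p] for p, a in zip(positions, amps))
            drop = resid * resid / r[k][k]    # error reduction for position k
            if drop > best_drop:
                best_pos, best_drop = k, drop
        positions.append(best_pos)
        # Re-optimize every amplitude on the reduced system.
        r_bar = [[r[p][q] for q in positions] for p in positions]
        c_bar = [c[p] for p in positions]
        amps = cholesky_solve(r_bar, c_bar)
    v = [0.0] * dim
    for p, a in zip(positions, amps):
        v[p] = a
    return v, positions


# Toy example with an uncorrelated R (identity) and two pulses.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
v_sparse, picked = sparse_codevector([0.5, -3.0, 2.0, 0.1], identity, 2)
```

Keeping more than one survivor per stage would replace the single `best_pos` with a list of the best candidates and repeat the inner loop for each; that extension is omitted here to keep the sketch short.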

SHIFT SYMMETRIC CODEBOOKS 

A shift symmetric codebook is defined as a codebook
in which a single codevector has all but $t$
elements in common with the next codevector. The
$l$-th codevector can be written $v^{(l)} = S^{(l)} C$, where $C$
is a $K_c + t(L_c - 1)$ dimensional column vector (the
codebook) and $S^{(l)}$ is a $K_c$ by $K_c + t(L_c - 1)$
dimensional shifting matrix with ones on the $tl$-th (upper)
diagonal, and zeros elsewhere:

$$S^{(l)} = \left[ 0_{K_c, tl} \mid I_{K_c, K_c} \mid 0_{K_c, t(L_c - 1) - tl} \right] \tag{23}$$

where $0_{K_c, n}$ is a $K_c$ by $n$ matrix of zeros, and $I_{K_c, K_c}$
is the $K_c$ by $K_c$ identity matrix. With $t = K_c$ we
obtain the general codebook discussed above.
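A short sketch of the shift structure follows, with hypothetical function names. Multiplying by the shifting matrix is equivalent to reading $K_c$ consecutive samples of $C$ starting at offset $tl$, so the matrix never has to be formed in practice; the explicit form is included only to check that equivalence.

```python
def shift_codevector(c_long, index, k_c, t):
    """Return v^(l) = S^(l) C as the K_c samples of C starting at offset t * l."""
    start = t * index
    return c_long[start:start + k_c]


def shift_matrix(index, k_c, t, l_c):
    """Explicit S^(l): K_c rows, K_c + t * (L_c - 1) columns,
    with ones on the (t * l)-th upper diagonal and zeros elsewhere."""
    cols = k_c + t * (l_c - 1)
    return [[1 if j == i + t * index else 0 for j in range(cols)]
            for i in range(k_c)]


def apply_matrix(m, v):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


# Toy codebook: K_c = 3, t = 1, L_c = 4, so C has 3 + 1 * 3 = 6 elements.
C_long = [10, 11, 12, 13, 14, 15]
v1 = shift_codevector(C_long, 1, k_c=3, t=1)
v2 = shift_codevector(C_long, 2, k_c=3, t=1)
v2_explicit = apply_matrix(shift_matrix(2, k_c=3, t=1, l_c=4), C_long)
```

With `t = 1`, adjacent codevectors overlap in all but one element, which is the property that fast search procedures exploit; with `t = k_c` the slices no longer overlap and the codebook reduces to the general case.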

Shift symmetric codebooks present a problem
since elements from a single codevector are included
in possibly many other codevectors. Thus, the design
algorithm must reflect this property. A modification
to the Vector Quantization (VQ) design algorithm
was utilized to account for the shift symmetric
codebooks. We have:

$$\epsilon = \sum_{n=0}^{N-1} \left| y_n - G_n H_n v^{(l_n)} \right|^2 \tag{24}$$

Again we assume we have an initial codebook,
but rather than partitioning the training sequence into
$L_c$ cells or regions using the nearest neighbour, minimum
distortion search criteria, we simply substitute
$v^{(l_n)} = S^{(l_n)} C$ into Equation 24 which yields:

$$\epsilon = \sum_{n=0}^{N-1} \left| y_n - G_n H_n S^{(l_n)} C \right|^2 = \sigma^2 - 2 c^T C + C^T R C \tag{25}$$

where:

$$\sigma^2 = \sum_{n=0}^{N-1} y_n^T y_n \tag{26}$$

$$c = \sum_{n=0}^{N-1} G_n S^{(l_n)T} H_n^T y_n \tag{27}$$

(a $K_c + t(L_c - 1)$ dimensional column vector) and:

$$R = \sum_{n=0}^{N-1} G_n^2 S^{(l_n)T} H_n^T H_n S^{(l_n)} \tag{28}$$

(a square $K_c + t(L_c - 1)$ dimensional band matrix).

The codebook is thus given by $C = R^{-1} c$
which, again, can be efficiently computed using
Choleski Decomposition. Further storage and
computational savings can be realized by using the fact
that $R$ is a band matrix. Computation of Equations
27 and 28 can be greatly simplified by exploiting the
structure in the shifting matrix.

Sparse shift symmetric codebooks can be designed
by applying a multipulse procedure to Equation 25,
as was done with general sparse codebooks.

VSELP CODEBOOK DESIGN

Let $L_c = 2^M$, where $M$ is the number of bits
in the codebook index. The VSELP excitation can
be given by $v^{(l)} = C b^{(l)}$ where $C$ is the VSELP
codebook (a $K_c$ by $M$ dimensional matrix), and $b^{(l)}$
(an $M$ dimensional column vector with elements $\pm 1$)
is the $l$-th codeword. Alternatively, yet equivalently,
the excitation can be written as $v^{(l)} = B^{(l)} \bar{C}$ where
$\bar{C}$ is a $K_c M$ dimensional column vector (containing
the stacked columns of $C$) and $B^{(l)}$ is a $K_c$ by
$K_c M$ dimensional Toeplitz matrix, with the first row
having elements $b_k^{(l)}$ in positions $B_{0, k K_c}^{(l)}$.

Over the training sequence, we may write:

$$\epsilon = \sum_{n=0}^{N-1} \left| y_n - G_n H_n v^{(l_n)} \right|^2 \tag{29}$$

Substituting $v^{(l_n)} = B^{(l_n)} \bar{C}$ into Equation 29 leads to:

$$\epsilon = \sum_{n=0}^{N-1} \left| y_n - G_n H_n B^{(l_n)} \bar{C} \right|^2 = \sigma^2 - 2 c^T \bar{C} + \bar{C}^T R \bar{C} \tag{30}$$

where:

$$\sigma^2 = \sum_{n=0}^{N-1} y_n^T y_n \tag{31}$$

$$c = \sum_{n=0}^{N-1} G_n B^{(l_n)T} H_n^T y_n \tag{32}$$

(a $K_c M$ dimensional column vector) and:

$$R = \sum_{n=0}^{N-1} G_n^2 B^{(l_n)T} H_n^T H_n B^{(l_n)} \tag{33}$$

is a $K_c M$ by $K_c M$ dimensional matrix.

The VSELP (stacked) codebook $\bar{C}$ is computed
by solving (again by Choleski Decomposition):

$$\bar{C} = R^{-1} c \tag{34}$$

Computation of Equations 32 and 33 can be
greatly simplified by exploiting the structure of $B^{(l_n)}$.
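The equivalence between the two ways of writing the VSELP excitation can be checked with a small sketch. The function names and the mapping from codebook index to a codeword of plus and minus ones are assumptions made for illustration; the source only states that the codeword elements are plus or minus one.

```python
def codeword(index, m_bits):
    """Map a codebook index to a +/-1 codeword (bit set -> +1). Assumed mapping."""
    return [1 if (index >> m) & 1 else -1 for m in range(m_bits)]


def vselp_excitation(c_mat, b):
    """v = C b for a K_c x M basis matrix C and an M element codeword b."""
    return [sum(row[m] * b[m] for m in range(len(b))) for row in c_mat]


def stack_columns(c_mat):
    """Stack the columns of C into one K_c * M element vector."""
    k_c, m_bits = len(c_mat), len(c_mat[0])
    return [c_mat[i][m] for m in range(m_bits) for i in range(k_c)]


def b_matrix(b, k_c):
    """B = [b_0 I | b_1 I | ... | b_{M-1} I], a K_c x (K_c * M) matrix."""
    return [[b[j // k_c] if j % k_c == i else 0 for j in range(k_c * len(b))]
            for i in range(k_c)]


def apply_matrix(m, v):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]


# Toy basis with K_c = 3 and M = 2, so L_c = 2^M = 4 codevectors.
C_basis = [[1, 2], [3, 4], [5, 6]]
b1 = codeword(1, 2)
v_direct = vselp_excitation(C_basis, b1)
v_stacked = apply_matrix(b_matrix(b1, 3), stack_columns(C_basis))
```

The stacked form is the one needed for the design equations, since it makes the excitation linear in a single unknown vector that can be solved for directly.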

RESULTS 

In this section we present results of computer 
simulations conducted on a 10 minute speech data- 
base and a 30 second speech database. Codebooks 
were trained on the large database and the perfor- 
mance was computed on both databases. Objective 
measures of performance included the segmental sig- 
nal to noise ratio defined by: 

$$\mathrm{SEGSNR} = \frac{1}{N} \sum_{n} 10 \log_{10} \left( \frac{s_n^T s_n}{\left| s_n - \hat{s}_n \right|^2} \right) \tag{35}$$

where

$$\hat{s}_n \tag{36}$$

is the synthesized (20 msec) speech vector, and the
noise weighted signal to noise ratio defined by:

$$\mathrm{NWSNR} = \frac{\sum_n s_n^T s_n}{\sum_n \epsilon_n} \tag{37}$$
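The segmental measure of Equation 35 can be sketched as below, assuming it is the mean over frames of the per-frame signal to error energy ratio in dB. The function name is hypothetical, and the sketch assumes every frame has non-zero signal energy and non-zero error energy.

```python
import math


def segmental_snr_db(frames, synth_frames):
    """Average per-frame SNR in dB between original and synthesized frames."""
    total = 0.0
    for s, s_hat in zip(frames, synth_frames):
        signal = sum(x * x for x in s)                         # s' s
        noise = sum((x - y) ** 2 for x, y in zip(s, s_hat))    # |s - s_hat|^2
        total += 10.0 * math.log10(signal / noise)
    return total / len(frames)


# Toy example: one frame at 20 dB and one frame at 0 dB.
original = [[10.0, 0.0], [10.0, 0.0]]
synthesized = [[9.0, 0.0], [0.0, 0.0]]
seg = segmental_snr_db(original, synthesized)
```

Averaging in the log domain gives quiet frames the same weight as loud ones, which is why this measure differs from a single SNR computed over the whole utterance.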


In our examples, the CELP coder used sub- 
frame dimensions of 40 samples, 2 samples of over- 
lap (which was determined to be near optimal), and 


frame sizes of 160 samples. The inverse filter ( A(z )) 
was determined at the frame rate using the autocor- 
relation method and quantized using interframe vec- 
tor linear prediction of the line spectrum pairs fol- 
lowed by scalar quantization of the error [10]. The 
long term predictor was optimized closed loop to 
minimize the closed loop weighted mean squared er- 
ror. The pitch period was constrained to be in the 
range from 21 to 148 samples. The general code- 
books used the autocorrelation method discussed in 
[2] (which does contain certain approximations). Our 
experiments with shift symmetric codebooks consid- 
ered t = 1 only, (and no approximations were used). 
The design of the sparse codebooks used the tree 
searched multipulse search procedure outlined above, 
with Mi = 128. The sparse shift symmetric codebooks
had more than 90% zero samples (52 non-zero
samples in a 512 level codebook).

Table 1 displays the performance of random
Gaussian codebooks for various codebook sizes ($L_c$).

Codebook Size (bits)    NWSNR (dB)    SEGSNR (dB)
7                       14.39         17.13
8                       14.84         17.75
9                       15.24         18.21
10                      15.59         18.66
11                      16.10         19.26

Table 1: Performance of random Gaussian codebooks
of various sizes (30 second database). The values are
accurate (with 95% confidence) to within 0.1 dB.


By comparison, a 9 bit random Gaussian shift
symmetric codebook obtained a noise weighted SNR 
of 15.05 dB (SEGSNR=18.11 dB) and a 9 bit random 
VSELP codebook obtained a NWSNR of 13.92 dB 
(SEGSNR=16.83 dB). Again the values are accurate 
(with 95% confidence) to within 0.1 dB. 

Trained 9 bit general codebooks, sparse shift 
symmetric, and VSELP codebooks (using the design 
techniques discussed above) obtained performance 
both inside and outside of the training sequence as 
shown in Tables 2 and 3. 

Codebook                  NWSNR (dB)    SEGSNR (dB)
General                   14.67         19.40
Sparse Shift Symmetric    13.95         18.84
VSELP                     13.43         18.54

Table 2: Performance of trained general, sparse shift
symmetric, and VSELP codebooks inside the training
sequence (10 minute database). The values are
accurate (with 95% confidence) to within 0.1 dB.

Codebook                  NWSNR (dB)    SEGSNR (dB)
General                   15.75         18.79
Sparse Shift Symmetric    15.60         18.48
VSELP                     14.72         17.84

Table 3: Performance of trained general, sparse shift
symmetric, and VSELP codebooks outside the training
sequence (30 second database). The values are
accurate (with 95% confidence) to within 0.1 dB.

Outside the training sequence the performance
(NWSNR) of sparse shift symmetric codebooks is
within 0.2 dB of the general codebooks, which is
within the 95% confidence intervals. Inside the
training sequence the performance of the general
codebook is approximately 0.7 dB better than the sparse
shift symmetric codebooks. Imposing structure limits
the performance inside the training sequence but
has little effect outside the training sequence in this
instance. The VSELP codebooks appear to have too
much structure and performance suffers by more than
0.8 dB both inside and outside the training sequence.
However, the performance of VSELP improves by
more than 0.8 dB after codebook design (outside the
training sequence).

The general and sparse shift symmetric trained 9 
bit codebooks have objective performance virtually 
equivalent to the untrained 10 bit random codebooks. 
Thus, for equivalent objective performance half the 
number of levels in the codebook are required, re- 
sulting in a lower data rate and a lower complexity. 

CONCLUSIONS 

This paper considered the CELP codebook de- 
sign problem for a variety of structured codebooks. 
It was determined that a savings of one bit per vector
could be realized with virtually no decrease in
the objective measures while decreasing complex- 
ity by a factor of two. For fixed codebook sizes, 
improvements of more than 0.5 dB were observed 
with no increase in computational complexity. It was 
observed that Vector Sum Excited Linear Prediction 
had too much structure, and performance was notice- 
ably inferior to the general or sparse shift symmetric 
codebooks. 


The structured codebook design techniques are 
relatively simple, and only require Choleski Decom- 
position or a relatively straightforward multipulse 
algorithm. The design algorithms were applied to 
CELP coders, but are sufficiently general to be applied
to other distortion measures as well.